Data Science — Urban Forest Risk Assessment - Sprint 1 complete - Sprint 2 In Progress (20%)#1721
Conversation
Initial setup: - "playground" folder organisation - requirements file for virtual environment - notebook test (to check vscode config)
EDA for all datasets being used. gitignore for data and venv files
All datasets cleaned and CRS aligned
Trees linked to nearest microclimate and soil sensors
- Engineered weather features (rolling averages, drought, heatwave) - Assembled everything into a feature table for ML model next
- Initial setup - Data preparation for ML risk scoring
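The weather features mentioned above (rolling averages plus a heatwave flag from consecutive hot days) could be sketched with pandas along these lines. This is a hedged illustration, not the PR's actual code: the column names (`tmax`, `tmax_7d_mean`, `heatwave`), the 35 °C threshold, and the 3-day streak rule are all placeholder assumptions.

```python
import pandas as pd

# Toy daily max-temperature series standing in for the real weather data.
rng = pd.date_range("2024-01-01", periods=10, freq="D")
df = pd.DataFrame({"date": rng,
                   "tmax": [29, 31, 33, 36, 37, 35, 28, 27, 30, 38]})

# Rolling average: 7-day mean of daily maximum temperature.
df["tmax_7d_mean"] = df["tmax"].rolling(window=7, min_periods=1).mean()

# Heatwave flag: day on which a run of hot days (>= 35 °C, threshold assumed)
# reaches 3 consecutive days.
hot = (df["tmax"] >= 35).astype(int)
# Restart the running count each time a non-hot day breaks the streak.
streak = hot.groupby((hot == 0).cumsum()).cumsum()
df["heatwave"] = streak >= 3

print(df[["date", "tmax", "tmax_7d_mean", "heatwave"]])
```

The `groupby(...).cumsum()` trick counts consecutive hot days without an explicit loop, which keeps the feature vectorised over the full weather history.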
manya0033
left a comment
There was a problem hiding this comment.
Hey @aidanuni, thanks for walking me through Sprint 1 - the structure is clean and I can see a lot of thought went into the feature engineering (rolling temp averages, heatwave flags; the consecutive-hot-days logic is a nice touch). Before I approve, a few things to address:
The CRS warning in 03_spatial_joins.ipynb is still showing because EPSG:7844 is actually a geographic CRS (GDA2020 lat/lon), not a projected one. You can see the effect in the distance stats: the column is labelled sensor_distance_m, but the values range from 0.00002 to 0.05, which can't be metres for 82k trees - they're still in degrees. Could you switch to a projected CRS like EPSG:7855 (GDA2020 / MGA zone 55) before the sjoin_nearest calls? That'll give you genuine metre distances and the warning will go away.
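Roughly what I mean - a minimal sketch with toy GeoDataFrames (the layer/column names here are stand-ins, not the PR's actual ones):

```python
import geopandas as gpd
from shapely.geometry import Point

# Toy layers in EPSG:7844 (GDA2020 geographic, degrees) standing in for the
# real tree and sensor data.
trees = gpd.GeoDataFrame(
    {"tree_id": [1, 2]},
    geometry=[Point(144.9631, -37.8136), Point(144.9700, -37.8100)],
    crs="EPSG:7844",
)
sensors = gpd.GeoDataFrame(
    {"sensor_id": ["a"]},
    geometry=[Point(144.9650, -37.8120)],
    crs="EPSG:7844",
)

# Reproject BOTH layers to EPSG:7855 (GDA2020 / MGA zone 55, metres)
# before the nearest join, so distance_col is in genuine metres.
trees_m = trees.to_crs(epsg=7855)
sensors_m = sensors.to_crs(epsg=7855)

joined = gpd.sjoin_nearest(trees_m, sensors_m,
                           distance_col="sensor_distance_m")
print(joined[["tree_id", "sensor_id", "sensor_distance_m"]])
```

With the reprojection in place the distances come out in the hundreds of metres for points a few thousandths of a degree apart, which is the sanity check that the warning was pointing at.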
The notebooks reference local paths like ../data/processed/feature_table.csv but I can't see where the raw data is actually coming from. If you're pulling the datasets from the Melbourne Open Data portal via API v2.1, could you include those API calls directly in the notebook so reviewers can reproduce the pipeline end-to-end? Otherwise, if any of the data is external, a CSV version should go in the DEPENDENCIES folder as per the checklist.
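For the reproducibility point, something like the following would let readers pull the raw CSVs straight from the portal. This is a sketch, not tested against your datasets: the portal is an Opendatasoft instance, so I'm assuming its Explore API v2.1 CSV export endpoint; the dataset id and the `;` delimiter are placeholders you'd need to confirm.

```python
import pandas as pd

# Base of the Melbourne Open Data portal's Explore API v2.1 (Opendatasoft).
BASE = "https://data.melbourne.vic.gov.au/api/explore/v2.1"

def export_csv_url(dataset_id: str) -> str:
    """Build the v2.1 CSV export URL for a dataset."""
    return f"{BASE}/catalog/datasets/{dataset_id}/exports/csv"

def fetch_dataset(dataset_id: str) -> pd.DataFrame:
    """Download a full dataset as a DataFrame (needs network access).
    Opendatasoft CSV exports typically use ';' as the delimiter - adjust
    if the portal is configured differently."""
    return pd.read_csv(export_csv_url(dataset_id), sep=";")

# Placeholder dataset id - substitute the real ids the pipeline uses.
url = export_csv_url("trees-urban-forest")
print(url)
```

Putting these calls at the top of the notebook (with the real dataset ids) means no local `../data/` files are required to reproduce the pipeline.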
One suggestion: would it be possible to consolidate everything into a single notebook rather than five separate ones? Our use cases are meant to read as step-by-step tutorials, and having the whole pipeline (exploration -> cleaning -> spatial joins -> feature engineering -> ML model) in one notebook with clear markdown headers between sections would make it much easier for readers to follow along and reproduce. It also avoids the issue of needing intermediate files saved to disk between notebooks.
A short README in your project folder would also help: just a few lines on the pipeline order, where the raw data comes from, and any setup notes.
No worries that the ML section is mostly empty; the PR title is clear that Sprint 2 is 20% in progress. Happy to re-review once those are sorted!
Layer 1 — Data Pipeline & Feature Engineering (complete)
Layer 2 — ML Risk Scoring (in progress)